-
Notifications
You must be signed in to change notification settings - Fork 0
DITTO Clean Up and Local Prediction instructions #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…dating documentation to reflect new path
…ctions and updated the .gitignore to ignore the work_dir for temporary nextflow runs
wilkb777
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor change to make testing the pipeline better/easier (requiring no file modifications out of the box) and corresponding changes to documentation.
pipeline.nf
Outdated
| workflow { | ||
|
|
||
| // Define input channels for the VCF files | ||
| vcfFile = Channel.fromPath(params.sample_sheet).splitCsv(header: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| vcfFile = Channel.fromPath(params.sample_sheet).splitCsv(header: false) | |
| vcfFile = Channel.fromPath(params.sample_sheet) | |
| .splitText() // Emit each line as a separate item | |
| .map { line -> | |
| // For each line (relative path), create a Nextflow file object relative to params.data_dir | |
| if (line.startsWith("/")){ | |
| return line.trim() | |
| } else { | |
| def abs_path = file(workflow.launchDir).resolve(line.trim()) | |
| return abs_path | |
| } | |
| } | |
| .map { path_obj -> | |
| // Ensure the path is a proper Path object for staging | |
| file(path_obj, checkIfExists: true) | |
| } |
This will enable relative pathing for file paths (relative to the launch directory, which is the directory the pipeline.nf workflow was run from) specified in the input sample sheet text files. This will allow you to delete the bit in the README instructions about having to specify full paths for inputs (which cleans up testing nicely). I've already tested that this works locally and on Cheaha (assuming run sbatch from the repos root directory) so this can be directly added in with no further review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! This will simplify the process further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the README.md to reflect this change. Now supports absolute and relative pathing for vcf.gz files in the file list.
README.md
Outdated
| - Update the `.test_data/file_list.txt` (inout vcfs) files with complete file paths and submit a slurm job using the | ||
| command below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Update the `.test_data/file_list.txt` (inout vcfs) files with complete file paths and submit a slurm job using the | |
| command below | |
| - Create a text file listing the path to VCF file(s) (1 path per line) with variants to score | |
| - Paths can be full absolute paths **or** relative paths (relative to the directory where the pipeline will be run from, **not** the directory where the `pipeline.nf` file is) | |
| - See the example input file [.test_data/file_list.txt](.test_data/file_list.txt) (lists 2 testing example input VCFs) | |
| for reference or as an input file for testing (default behavior of `model.job`) |
updated this text to be a bit more explicit and clear on what to do for input (see suggestion on supporting relative pathing for input files in pipeline.nf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the README.md, provided real examples of the relative/absolute pathing in addition to this clarification.
Co-authored-by: Brandon M Wilk <[email protected]>
Co-authored-by: Brandon M Wilk <[email protected]>
Co-authored-by: Brandon M Wilk <[email protected]>
Co-authored-by: Brandon M Wilk <[email protected]>
sdhutchins
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are mainly suggestions and not requirements. This is good to go after @wilkb777's suggested changes go in.
…ded names to other environments
…ile_list.txt for pathing on vcfs for DITTO; Updated README to discuss the use of Mamba for NextFlow
…throwing errors. We suspect the version it pulled automatically was too new for the pipeline
Please check if the PR fulfills these requirements
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
.gitignoreto not commit DITTO output or work done by NextFlowREADME.mdto include new instructions, requirements, document linking, and notes.configsubfolders per service and updated code to reflect new pathspython=3.10dependency for theconfigs/conda/open-cravat.ymlconda environmentconfigs/nextflow/local.configfile to allow NextFlow to use Anaconda when running DITTO locally on a deviceshap_plotsdirectory intodocsdirectorypipeline.nfto make the designated output directory along with the parent folders$PWD/data/outputmodel.jobfile to output DITTO scores to$PWD/data/outputwhen finishedWhat is the current behavior? (You can also link to an open issue here)
DITTO is difficult to run and this should help streamline the process.
What is the new behavior (if this is a feature change)?
This update is intended to make DITTO more user friendly and easier to run.
Does this PR introduce a breaking change?
N/A
To Review:
local-predictionbranchHPC Prediction with CheahaorLocal Predictioninstructions and notes belowHPC Prediction with Cheaha
HPC Prediction with CheahaIn the root DITTO folder, run
tail -f DITTO_logs.outto see the outputLocal Prediction Notes:
Reviewer will need the local device to have access to all the OpenCravat annotators, this will be roughly 600GB in disk space. Contact PR assignee for an external drive containing the data if you would like to test this route.
It looks like OpenCravat conda environment needs to be created and the module path set before DITTO can run. Even though it's in the NextFlow pipeline to set the path, it doesn't seem to do it.
conda create --name opencravat-envconda activate opencravat-envconda env update -n opencravat-env --file ./configs/conda/open-cravat.yamloc config md /Volumes/my_book/opencravat/modulesconda deactivateLocal PredictionLocal Prediction
.test_data/file_list.txtRunning DITTO locallyDITTO completing locallyHPC Prediction with Cheaha
.test_data/file_list.txtRunning DITTO with HPCDITTO completing on Cheaha